{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# [ Chapter 12 - Overcoming Bias in Learned Relevance Models ]\n",
    "\n",
    "## Chapter 12 setup\n",
    "\n",
    "In chapter 12, we continue our work on a Learning to Rank solution. Evolving from a purely offline use of click-based training data to trying to explore potentially relevant items the users may find valuable. \n",
    "\n",
    "To setup, we\n",
    "\n",
    "1. Fetch the retrotech data\n",
    "2. Enable LTR\n",
    "3. Define a few fields (different ways of analyzing the underlying retrotech text)\n",
    "4. Define a list of 'promoted' products that our store wants to make prominent\n",
    "5. Insert the retrotech product data via spark\n",
    "\n",
    "\n",
    "### [TODO: remove the product / signals cells, as those were loaded in ch4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('../..')\n",
    "from aips import *\n",
    "from pyspark.sql import SparkSession\n",
    "spark = SparkSession.builder.appName(\"AIPS\").getOrCreate()\n",
    "engine = get_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Already up to date.\n",
      "products.csv\n",
      "signals.csv\n"
     ]
    }
   ],
   "source": [
    "#Get datasets\n",
    "![ ! -d 'retrotech' ] && git clone --depth=1 https://github.com/ai-powered-search/retrotech.git\n",
    "! cd retrotech && git pull\n",
    "! cd retrotech && tar -xvf products.tgz -C '../data/retrotech/' && tar -xvf signals.tgz -C '../data/retrotech/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\"upc\",\"name\",\"manufacturer\",\"short_description\",\"long_description\"\n",
      "\"096009010836\",\"Fists of Bruce Lee - Dolby - DVD\", , , \n",
      "\"043396061965\",\"The Professional - Widescreen Uncut - DVD\", , , \n",
      "\"085391862024\",\"Pokemon the Movie: 2000 - DVD\", , , \n",
      "\"067003016025\",\"Summerbreeze - CD\",\"Nettwerk\", , \n",
      "\"731454813822\",\"Back for the First Time [PA] - CD\",\"Def Jam South\", , \n",
      "\"024543008200\",\"Big Momma's House - Widescreen - DVD\", , , \n",
      "\"031398751823\",\"Kids - DVD\", , , \n",
      "\"037628413929\",\"20 Grandes Exitos - CD\",\"Sony Discos Inc.\", , \n",
      "\"060768972223\",\"Power Of Trinity (Box) - CD\",\"Sanctuary Records\", , \n"
     ]
    }
   ],
   "source": [
    "! cd data/retrotech/ && head products.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wiping \"products\" collection\n",
      "Creating \"products\" collection\n",
      "Status: Success\n",
      "Adding LTR QParser for products collection\n",
      "Adding LTR Doc Transformer for products collection\n",
      "Loading Products\n",
      "Schema: \n",
      "root\n",
      " |-- upc: string (nullable = true)\n",
      " |-- name: string (nullable = true)\n",
      " |-- manufacturer: string (nullable = true)\n",
      " |-- short_description: string (nullable = true)\n",
      " |-- long_description: string (nullable = true)\n",
      "\n",
      "Successfully written 48194 documents\n"
     ]
    }
   ],
   "source": [
    "from aips.data_loaders.products import load_dataframe\n",
    "\n",
    "#Create Products Collection\n",
    "products_collection = engine.create_collection(\"products\")\n",
    "get_ltr_engine(products_collection).enable_ltr()\n",
    "promoted = [600603141003, 27242813908, 74108007469,\n",
    "            12505525766, 400192926087, 47875842328, \n",
    "            803238004525, 27242799127, 27242815414,\n",
    "            97360724240, 97360722345, 826663114164]\n",
    "\n",
    "promoted = [{\"upc\": promoted_upc, \"has_promotion\": True}\n",
    "            for promoted_upc in promoted]\n",
    "promoted_dataframe = spark.createDataFrame(promoted)\n",
    "\n",
    "products_dataframe = load_dataframe(\"data/retrotech/products.csv\")\n",
    "enriched_data = products_dataframe.join(promoted_dataframe, [\"upc\"], \"left\")\n",
    "products_collection.write(enriched_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'upc': '97360722345', 'name': 'Transformers/Transformers: Revenge of the Fallen: Two-Movie Mega Collection [2 Discs] - Widescreen - DVD', 'manufacturer': ' ', 'score': 3.3835273}, {'upc': '97360724240', 'name': 'Transformers: Revenge of the Fallen - Widescreen - DVD', 'manufacturer': ' ', 'score': 3.1457326}, {'upc': '400192926087', 'name': 'Transformers: Dark of the Moon - Original Soundtrack - CD', 'manufacturer': 'Reprise', 'score': 2.9793549}, {'upc': '47875842328', 'name': 'Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii', 'manufacturer': 'Activision', 'score': 2.851606}, {'upc': '826663114164', 'name': 'Transformers: The Complete Series [25th Anniversary Matrix of Leadership Edition] [16 Discs] - DVD', 'manufacturer': ' ', 'score': 2.356245}]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div id=\"demo\">\n",
       "\t<center><input style=\"width:40%\" readonly type=\"text\" name=\"q\" value=\"Transformers\">\n",
       "\t<input readonly type=\"submit\" value=\"Search\">\n",
       "\t</center>\n",
       "    <div class=\"results\">\n",
       "    \t\n",
       "\n",
       "    </div>\n",
       "</div>\n",
       "    \t<div style=\"position: relative; width: 100%; height:auto; overflow: auto;\">\n",
       "\t    \t<div style=\"position: relative; float:left; width: 120px; margin-top:5px\">\n",
       "\t    \t\t<img style=\"width:250px; height: auto; max-height:150px\" src=\"../../data/retrotech/images/97360722345.jpg\">\n",
       "\t    \t</div>\n",
       "\t    \t<div style=\"position:relative; float:left; clear:none; width: 80%; height:auto\">\n",
       "\t    \t\t<p style=\"font-size:24px; padding-left: 50px;\"><strong>\t\tName:</strong> Transformers/Transformers: Revenge of the Fallen: Two-Movie Mega Collection [2 Discs] - Widescreen - DVD\n",
       "\t\t\t\t\t<br><strong>\t\tManufacturer:</strong>  </p>\n",
       "\t    \t\t</p>\n",
       "\t    \t</div>\n",
       "    \t</div>\n",
       "    \t\n",
       "\t\t<div style=\"position:relative; clear:both; content: ' '; display: block; height: 1px; margin-top: 10px; margin-bottom:20px\">\n",
       "\t\t\t<hr style=\"color: gray; width: 95%;\" />\n",
       "\t\t</div>\n",
       "\t\t\n",
       "    \t<div style=\"position: relative; width: 100%; height:auto; overflow: auto;\">\n",
       "\t    \t<div style=\"position: relative; float:left; width: 120px; margin-top:5px\">\n",
       "\t    \t\t<img style=\"width:250px; height: auto; max-height:150px\" src=\"../../data/retrotech/images/97360724240.jpg\">\n",
       "\t    \t</div>\n",
       "\t    \t<div style=\"position:relative; float:left; clear:none; width: 80%; height:auto\">\n",
       "\t    \t\t<p style=\"font-size:24px; padding-left: 50px;\"><strong>\t\tName:</strong> Transformers: Revenge of the Fallen - Widescreen - DVD\n",
       "\t\t\t\t\t<br><strong>\t\tManufacturer:</strong>  </p>\n",
       "\t    \t\t</p>\n",
       "\t    \t</div>\n",
       "    \t</div>\n",
       "    \t\n",
       "\t\t<div style=\"position:relative; clear:both; content: ' '; display: block; height: 1px; margin-top: 10px; margin-bottom:20px\">\n",
       "\t\t\t<hr style=\"color: gray; width: 95%;\" />\n",
       "\t\t</div>\n",
       "\t\t\n",
       "    \t<div style=\"position: relative; width: 100%; height:auto; overflow: auto;\">\n",
       "\t    \t<div style=\"position: relative; float:left; width: 120px; margin-top:5px\">\n",
       "\t    \t\t<img style=\"width:250px; height: auto; max-height:150px\" src=\"../../data/retrotech/images/400192926087.jpg\">\n",
       "\t    \t</div>\n",
       "\t    \t<div style=\"position:relative; float:left; clear:none; width: 80%; height:auto\">\n",
       "\t    \t\t<p style=\"font-size:24px; padding-left: 50px;\"><strong>\t\tName:</strong> Transformers: Dark of the Moon - Original Soundtrack - CD\n",
       "\t\t\t\t\t<br><strong>\t\tManufacturer:</strong> Reprise</p>\n",
       "\t    \t\t</p>\n",
       "\t    \t</div>\n",
       "    \t</div>\n",
       "    \t\n",
       "\t\t<div style=\"position:relative; clear:both; content: ' '; display: block; height: 1px; margin-top: 10px; margin-bottom:20px\">\n",
       "\t\t\t<hr style=\"color: gray; width: 95%;\" />\n",
       "\t\t</div>\n",
       "\t\t\n",
       "    \t<div style=\"position: relative; width: 100%; height:auto; overflow: auto;\">\n",
       "\t    \t<div style=\"position: relative; float:left; width: 120px; margin-top:5px\">\n",
       "\t    \t\t<img style=\"width:250px; height: auto; max-height:150px\" src=\"../../data/retrotech/images/47875842328.jpg\">\n",
       "\t    \t</div>\n",
       "\t    \t<div style=\"position:relative; float:left; clear:none; width: 80%; height:auto\">\n",
       "\t    \t\t<p style=\"font-size:24px; padding-left: 50px;\"><strong>\t\tName:</strong> Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii\n",
       "\t\t\t\t\t<br><strong>\t\tManufacturer:</strong> Activision</p>\n",
       "\t    \t\t</p>\n",
       "\t    \t</div>\n",
       "    \t</div>\n",
       "    \t\n",
       "\t\t<div style=\"position:relative; clear:both; content: ' '; display: block; height: 1px; margin-top: 10px; margin-bottom:20px\">\n",
       "\t\t\t<hr style=\"color: gray; width: 95%;\" />\n",
       "\t\t</div>\n",
       "\t\t\n",
       "    \t<div style=\"position: relative; width: 100%; height:auto; overflow: auto;\">\n",
       "\t    \t<div style=\"position: relative; float:left; width: 120px; margin-top:5px\">\n",
       "\t    \t\t<img style=\"width:250px; height: auto; max-height:150px\" src=\"../../data/retrotech/images/826663114164.jpg\">\n",
       "\t    \t</div>\n",
       "\t    \t<div style=\"position:relative; float:left; clear:none; width: 80%; height:auto\">\n",
       "\t    \t\t<p style=\"font-size:24px; padding-left: 50px;\"><strong>\t\tName:</strong> Transformers: The Complete Series [25th Anniversary Matrix of Leadership Edition] [16 Discs] - DVD\n",
       "\t\t\t\t\t<br><strong>\t\tManufacturer:</strong>  </p>\n",
       "\t    \t\t</p>\n",
       "\t    \t</div>\n",
       "    \t</div>\n",
       "    \t\n",
       "\t\t<div style=\"position:relative; clear:both; content: ' '; display: block; height: 1px; margin-top: 10px; margin-bottom:20px\">\n",
       "\t\t\t<hr style=\"color: gray; width: 95%;\" />\n",
       "\t\t</div>\n",
       "\t\t"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "query = \"Transformers\"\n",
    "\n",
    "collection = \"products\"\n",
    "request = {\n",
    "    \"query\": query,\n",
    "    \"query_fields\": [\"name\", \"manufacturer\", \"long_description\"],\n",
    "    \"return_fields\": [\"upc\", \"name\", \"manufacturer\", \"score\"],\n",
    "    \"filters\": [(\"has_promotion\", True)],\n",
    "    \"limit\": 5,\n",
    "    \"order_by\": [(\"score\", \"desc\"), (\"upc\", \"asc\")]\n",
    "}\n",
    "\n",
    "response = products_collection.search(**request)\n",
    "print(response[\"docs\"])\n",
    "display_product_search(query, response[\"docs\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\"query_id\",\"user\",\"type\",\"target\",\"signal_time\"\n",
      "\"u2_0_1\",\"u2\",\"query\",\"nook\",\"2019-07-31 08:49:07.3116\"\n",
      "\"u2_1_2\",\"u2\",\"query\",\"rca\",\"2020-05-04 08:28:21.1848\"\n",
      "\"u3_0_1\",\"u3\",\"query\",\"macbook\",\"2019-12-22 00:07:07.0152\"\n",
      "\"u4_0_1\",\"u4\",\"query\",\"Tv antenna\",\"2019-08-22 23:45:54.1030\"\n",
      "\"u5_0_1\",\"u5\",\"query\",\"AC power cord\",\"2019-10-20 08:27:00.1600\"\n",
      "\"u6_0_1\",\"u6\",\"query\",\"Watch The Throne\",\"2019-09-18 11:59:53.7470\"\n",
      "\"u7_0_1\",\"u7\",\"query\",\"Camcorder\",\"2020-02-25 13:02:29.3089\"\n",
      "\"u9_0_1\",\"u9\",\"query\",\"wireless headphones\",\"2020-04-26 04:26:09.7198\"\n",
      "\"u10_0_1\",\"u10\",\"query\",\"Xbox\",\"2019-09-13 16:26:12.0132\"\n"
     ]
    }
   ],
   "source": [
    "! cd data/retrotech && head signals.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download query sessions\n",
    "\n",
    "Download simulated raw clickstream data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/dryer_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/bluray_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/blue ray_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/headphones_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/ipad_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/iphone_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/kindle_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/lcd tv_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/macbook_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/nook_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/star trek_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/star wars_sessions.gz\n",
      "GET https://github.com/ai-powered-search/retrotech/raw/master/sessions/transformers dark of the moon_sessions.gz\n"
     ]
    }
   ],
   "source": [
    "from ltr import download\n",
    "simulated_queries = [\"dryer\", \"bluray\", \"blue ray\", \"headphones\", \"ipad\", \"iphone\",\n",
    "                     \"kindle\", \"lcd tv\", \"macbook\", \"nook\", \"star trek\", \"star wars\",\n",
    "                     \"transformers dark of the moon\"]\n",
    "\n",
    "sessions = [f\"https://github.com/ai-powered-search/retrotech/raw/master/sessions/{query}_sessions.gz\"\n",
    "            for query in simulated_queries]\n",
    "           \n",
    "download(sessions, dest=\"data/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " ai_pow_search_judgments.txt  'lcd tv_sessions.gz'\n",
      "'blue ray_sessions.gz'\t       macbook_sessions.gz\n",
      " bluray_sessions.gz\t       movies.tgz\n",
      " cooking\t\t       nook_sessions.gz\n",
      " devops\t\t\t       normed_judgments.txt\n",
      " dryer_sessions.gz\t       predictor_deltas.npy\n",
      " feature_data.npy\t       retrotech\n",
      " headphones_sessions.gz        scifi\n",
      " health\t\t\t      'star trek_sessions.gz'\n",
      " ipad_sessions.gz\t      'star wars_sessions.gz'\n",
      " iphone_sessions.gz\t       tmdb.json\n",
      " jobs\t\t\t      'transformers dark of the moon_sessions.gz'\n",
      " judgments.tgz\t\t       travel\n",
      " kindle_sessions.gz\n"
     ]
    }
   ],
   "source": [
    "!ls data/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'blue ray_sessions.gz'\t 'lcd tv_sessions.gz'\n",
      " bluray_sessions.gz\t  macbook_sessions.gz\n",
      " dryer_sessions.gz\t  nook_sessions.gz\n",
      " headphones_sessions.gz  'star trek_sessions.gz'\n",
      " ipad_sessions.gz\t 'star wars_sessions.gz'\n",
      " iphone_sessions.gz\t 'transformers dark of the moon_sessions.gz'\n",
      " kindle_sessions.gz\n"
     ]
    }
   ],
   "source": [
    "!ls retrotech/sessions/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Up next: [A/B Testing Simulation to Active Learning](1.ab-testing-to-active-learning.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
